AMENDMENTS TO THE SPECIFICATION

Please add the following new paragraphs [0037] through [0064] shown below. 

[0037] Figures 10A and 10B illustrate a method for clustering a string, the string
including a plurality of characters, in accordance with one embodiment of the
present invention. The method illustrated in Figures 10A and 10B includes the
steps of:

[0038] identifying R unique n-grams T1...R in the string (step 1005);
[0039] for every unique n-gram Ts (step 1010):

[0040] if the frequency of Ts in a set of n-gram statistics is not greater than a first
threshold (step 1015):

[0041] associating the string with a cluster associated with Ts (step 1020);
[0042] otherwise:

[0043] for every other n-gram Tv in the string T1...R, except s (step 1025):
[0044] if the frequency of n-gram Tv is greater than the first threshold (step 1030):
[0045] if the frequency of n-gram pair Ts-Tv is not greater than a second threshold
(step 1050):

[0046] associating the string with a cluster associated with the n-gram pair Ts-Tv
(step 1055);
[0047] otherwise:

[0048] for every other n-gram Tx in the string T1...R, except s and v (step 1060):

[0049] associating the string with a cluster associated with the n-gram triple
Ts-Tv-Tx (step 1065);

[0050] otherwise:

[0051] do nothing (step 1035).
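
The steps of Figures 10A and 10B can be sketched in Python as follows. This is an illustrative reading of the listed steps, not the applicant's implementation. The function names, the default n-gram length of 3, the dictionaries `freq` and `pair_freq` holding the n-gram statistics, and the choice to treat an n-gram pair as unordered (keyed by a sorted tuple) are all assumptions made for the example. An n-gram or pair missing from the statistics is assumed to have frequency 0.

```python
def unique_ngrams(s, n=3):
    """Return the unique n-grams of s in first-occurrence order (step 1005)."""
    return list(dict.fromkeys(s[i:i + n] for i in range(len(s) - n + 1)))


def cluster_keys_fig10(s, freq, pair_freq, t1, t2, n=3):
    """Return the set of cluster keys the string s is associated with.

    freq      : hypothetical dict mapping an n-gram to its frequency
    pair_freq : hypothetical dict mapping a sorted (a, b) tuple to its frequency
    t1, t2    : the first and second thresholds
    """
    grams = unique_ngrams(s, n)
    keys = set()
    for ts in grams:                                   # step 1010
        if freq.get(ts, 0) <= t1:                      # step 1015
            keys.add((ts,))                            # step 1020
            continue
        for tv in grams:                               # step 1025
            if tv == ts:
                continue
            if freq.get(tv, 0) <= t1:                  # step 1030 not satisfied
                continue                               # step 1035: do nothing
            pair = tuple(sorted((ts, tv)))
            if pair_freq.get(pair, 0) <= t2:           # step 1050
                keys.add(pair)                         # step 1055
            else:
                for tx in grams:                       # step 1060
                    if tx not in (ts, tv):
                        # step 1065: cluster keyed by the n-gram triple
                        keys.add(tuple(sorted((ts, tv, tx))))
    return keys
```

Under these assumptions, a rare n-gram identifies a cluster on its own, a common n-gram is combined with another common n-gram into a pair, and a pair that is itself too common is extended to a triple.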

[0052] Figures 11A through 11C illustrate a method for clustering a string, the
string including a plurality of characters, in accordance with another embodiment
of the present invention. The method illustrated in Figures 11A through 11C
includes the steps of:

[0053] identifying R unique n-grams T1...R in the string (step 1105);
[0054] for every unique n-gram Ts (step 1110):

[0055] if the frequency of Ts in a set of n-gram statistics is not greater than a first
threshold (step 1115):

[0056] associating the string with a cluster associated with Ts (step 1120);

[0057] otherwise:

[0058] for i = 1 to Y (step 1135):

[0059] for every unique set of i n-grams Tu in the string T1...R, except s (step 1140):
[0060] if the frequency of the n-gram set Ts-Tu is not greater than a second
threshold (step 1145):

[0061] associating the string with a cluster associated with the n-gram set Ts-Tu
(step 1150);

[0062] if the string has not been associated with a cluster with this value of Ts
(step 1125):

[0063] for every unique set of Y+1 n-grams TuY in the string T1...R, except s (step
1165):

[0064] associating the string with a cluster associated with the Y+2 n-gram group
Ts-TuY (step 1170).
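
The generalized method of Figures 11A through 11C can be sketched in Python in the same spirit. Again, this is one possible reading of the listed steps rather than a definitive implementation. The function names, the default n-gram length, the dictionaries `freq` and `set_freq`, and the use of a `frozenset` as the key for an n-gram set (which treats sets as unordered) are assumptions made for illustration. A set missing from the statistics is assumed to have frequency 0.

```python
from itertools import combinations


def unique_ngrams(s, n=3):
    """Return the unique n-grams of s in first-occurrence order (step 1105)."""
    return list(dict.fromkeys(s[i:i + n] for i in range(len(s) - n + 1)))


def cluster_keys_fig11(s, freq, set_freq, t1, t2, y, n=3):
    """Return the set of cluster keys (frozensets of n-grams) for the string s.

    freq     : hypothetical dict mapping an n-gram to its frequency
    set_freq : hypothetical dict mapping a frozenset of n-grams to its frequency
    t1, t2   : the first and second thresholds
    y        : the bound Y on the number of extra n-grams tried per set
    """
    grams = unique_ngrams(s, n)
    keys = set()
    for ts in grams:                                    # step 1110
        if freq.get(ts, 0) <= t1:                       # step 1115
            keys.add(frozenset((ts,)))                  # step 1120
            continue
        others = [g for g in grams if g != ts]          # T1...R, except s
        associated = False
        for i in range(1, y + 1):                       # step 1135
            for tu in combinations(others, i):          # step 1140
                group = frozenset((ts, *tu))
                if set_freq.get(group, 0) <= t2:        # step 1145
                    keys.add(group)                     # step 1150
                    associated = True
        if not associated:                              # step 1125
            for tuy in combinations(others, y + 1):     # step 1165
                keys.add(frozenset((ts, *tuy)))         # step 1170
    return keys
```

With Y = 1 the sketch tries pairs first and falls back to triples only when no pair containing Ts is rare enough, which is the respect in which it differs from the per-pair fallback of the earlier embodiment.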

Please renumber original paragraph number [0037] to paragraph number [0065], 
as shown below. 

[0037] [0065] The text above describes one or more specific embodiments of a 
broader invention. The invention also is carried out in a variety of alternate 
embodiments and thus is not limited to those described here. For example, as 
mentioned above, while the invention has been described here in terms of a DBMS
that uses a massively parallel processing (MPP) architecture, other types of 
database systems, including those that use a symmetric multiprocessing (SMP) 
architecture, are also useful in carrying out the invention. The foregoing 
description of the preferred embodiment of the invention has been presented for 
the purposes of illustration and description. It is not intended to be exhaustive or 
to limit the invention to the precise form disclosed. Many modifications and 
variations are possible in light of the above teaching. It is intended that the scope 
of the invention be limited not by this detailed description, but rather by the claims 
appended hereto. 